ABSTRACT

The research reported here centered on development of the
concept of a translating computer interface by which the networking
of heterogeneous interactive information systems may be achieved
during the period in which information retrieval system and network
standards are evolving. The particular concepts and techniques
investigated are the virtual system concept, a common command
language, a master index and thesaurus, and a common bibliographic
data structure. In addition to the theoretical study of the problem,
an experimental interface has been developed that connects the
MEDLINE and Intrex retrieval systems via ARPANET communication links
and that performs some of the networking functions of the virtual
system. (Author/WH)
ABSTRACT

This report describes results of an 18-month
research effort in the coupling of interactive information
systems. The research has centered on development of the
concept of a translating computer interface by which the
networking of heterogeneous interactive information systems
may be achieved in the period during which I-R system and
network standards are evolving. Particular concepts and
techniques which have been investigated are: (1) the
virtual system concept by which users perceive the network
as a single homogeneous system; (2) a common command
language synthesized from a basic language of primitive I-R
functions; (3) a master index and thesaurus which stores the
vocabularies of the separate data bases along with index
term interrelationships and counts; and (4) a common
bibliographic data structure in which the data elements for
bibliographic information are hierarchically structured and
interrelated among different data bases. In addition to the
theoretical study of the problem, an experimental interface
has been developed that connects the MEDLINE and Intrex
retrieval systems via ARPANET communication links and that
performs some of the networking functions of the virtual
system.
I. INTRODUCTION

This report describes results obtained under National
Science Foundation Grant GN-36520, entitled "Research in
the Coupling of Interactive Information Systems". The
period of the grant was January 1, 1973 through June 30, 1974.

The research was motivated by our concern that the
information systems that are beginning to appear in
operational environments will not be effectively utilized
because of an inability of users and information specialists
to master them. Degradation in the quality of service is
likely to occur because of the many differences that exist
among the systems and the time required to gain an
understanding of their specialized features.

Under the present state of affairs, in which each
information system has its own unique features and is
accessed in accordance with its own set of special rules,
it is out of the question to expect users themselves to
make efficient use of the various systems. There are just
too many subtle procedures to master in a short time. We
have even observed that considerable training of information
specialists is required to bring them to a high level
of proficiency and we foresee an upper limit to the
number of disparately designed systems the specialists
will be able to handle.

It is clear that until such time as the user
community becomes highly proficient in online-access
procedures, information specialists must be on hand to
serve user needs. Although it may turn out that a certain
percentage of users will always want to delegate search
responsibilities to the specialist, one would like to move
in the direction of self-service as a means of getting
the user involved directly in the solution of his own
informational problem, thereby reducing his costs and
improving personal satisfaction. This will not be
possible unless system access is made simple and
straightforward.

Hence, future courses of action become obvious:
either uniform standards must be adopted for all systems,
or computerized interfaces must be developed which
accommodate nonuniformities and re-present them to the
user in a single, standardized form. It is along the
latter line that we have been working.
II. DISCUSSION: THE NEED FOR A NETWORK OF HETEROGENEOUS
INFORMATION SYSTEMS

A. Recent Advances in Interactive Retrieval Systems.
A number of interactive bibliographic information retrieval
systems* have been developed in recent years. This type of
online computer system has been widely acclaimed by users
for rapid and easy access to large data bases of bibliographic
references. The economic viability of these systems is
attested to by their continued growth** and by the fact that a
number of commercially sponsored systems are currently
available. In fact, it is now possible to gain access to
these retrieval systems from most points in this country at
costs ranging from about $6 to $100 per connect hour. These
systems contain in the aggregate references to documents
numbering in the millions in such subject areas as
chemistry, aeronautics and astronautics, education,
agriculture, nuclear science, toxicology, medicine, engineering,
and environmental studies, as well as data bases covering
several subject areas for such document types as journal
articles, government-sponsored reports, Library of Congress
cataloged monographs, and news articles.

* A collection of descriptions of several of these systems
is found in Walker, Donald E. (ed.), Interactive Bibliographic
Search: The User/Computer Interface, AFIPS Press, Montvale,
N.J., 1971.

** Economic viability is discussed in the following two
papers:

C.W. Therrien and J.F. Reintjes, Modeling of Information
Systems, Proceedings of the Sixth Annual Princeton
Conference on Information Sciences and Systems, Princeton
University, March 1972.

Davis B. McCarn and Joseph Leiter, On-Line Services
in Medicine and Beyond, Science, Vol. 181, No. 4097,
27 July 1973, pp 318-324.

B. Limitations of Present Systems. A major
limitation of current systems is the size of the data base
that can be stored online. A data base containing
bibliographic information for a million documents is about the
maximum size for effective online operation with current
hardware/software environments for single computer systems.
However, a collection of this size represents a very few
documents when measured against the total amount of
published literature. In particular, a million documents
will cover the literature of a single discipline (for
example, chemistry) for only a very few years.

One might argue that most researchers work only in a
fairly narrow area and could be adequately served by a data
base of the size of a million documents. There are
several problems with this kind of argument. In the first
place, the information needs of users are often most
critical in areas outside their own specialty. Thus, for
example, when starting out in a new aspect or when seeking
an experimental device for instrumentation, a researcher
may have more need for information than when working
strictly in his own specialty. In Project Intrex we found that
there were many more serious* users who were from outside
the 5 specialties for which the data base was selected than
there were from within those specialties. Also, there seems to
be a growing trend toward interdisciplinary activity with
the concomitant need for multiple data bases. Even users of
systems with many hundreds of thousands of documents
regularly ask for a broader coverage. Then, too, much of the use
of these systems is by information specialists acting as
delegated searchers; these specialists may have many clients
from different subject areas and, hence, a need for a
multiplicity of large data bases.

* See Project Intrex Semiannual Activity Report,
15 September 1971, Massachusetts Institute of Technology,
pp 6-[illegible]. PB 202 U60.

Another limitation on current systems is their
capacity in terms of number of simultaneous online users.
This capacity is usually numbered in the tens whereas
there is a potential for thousands of simultaneous users,
even if only the United States is considered.

C. The Ultimate Uniform Network Solution. Ultimately,
the solution to these problems may necessitate the
construction of a large-scale, on-line information retrieval
network made up of many similar (preferably identical)
computer nodes, each node being associated with an online
data base of a million or more documents on a separate set
of topics. For maximum efficiency users connected to each
node (there might be several hundred online users at
any one time) would make requests in a common retrieval
language. Such requests would lead to parallel searching
of the appropriate data bases which would be organized
within a standard file structure. Intercommunication among
computer nodes would be accomplished over high-speed
communication links for which data-concentrator techniques
would be employed to gain further efficiencies and to
reduce response times.

Thus, in order to achieve economies of scale,
with data bases created only once and used many times by
a large user community, and to provide easy transfer of
information among this large community, the ultimate
solution appears to be a uniform network of standardized
parts. The telephone and railroad (standard gauge)
networks would seem to provide good analogies.

D. Obstacles to Immediate Implementation of the
Ultimate Network. For the next several years, however, the
degree of standardization required for the ultimate network
is unlikely in view of the already heavy investments that
have been made in existing heterogeneous, nonstandardized
retrieval systems. Lack of standardization is a pervasive
barrier to intercommunication. A potential user of
different retrieval systems is faced with a series of obstacles
right from the start: the necessity to discover these
systems in the first place, to set up separate procedures to
gain access and account for costs, and, quite possibly, to
make actual access via different terminals and separate
locations. Other obstacles face the user once access is
made: different command languages, retrieval functions,
indexing vocabularies, and output formats. If the
programmers of one system wanted their system to communicate
directly with another, they would face problems of
different operating systems, hardware, programming languages,
character codes, word/byte/bit organization, file
organizations, and, most directly for the majority of
I-R systems, no established computer-to-computer
communication links.

Because of the established character of the
different I-R systems, their environments, and their user
clientele, and the cost of remaking data bases in
different file organizations (even if permission to do so
were granted), it seems unlikely that any existing I-R
system and environment will soon become a de facto standard.

E. The Computer Interface. In view of the
foregoing obstacles to the immediate implementation of the
ultimate network, we decided on a course of action
for the intermediate term which seeks to approximate the
effectiveness and efficiency inherent in the ultimate
network as best as possible through a currently achievable
network based on computer-interface techniques. Such
an interface achieves compatibility among systems of
heterogeneous hardware and software components through
translating and conversion algorithms. It is this kind of
interface that we have investigated and are reporting upon
here.

The over-all objective of our work was to establish
through analysis and experiment the feasibility (or
infeasibility) of a computer-stored common language interface
for disparately designed information systems. Our program
was conducted under three major headings:

Study of I-R Systems and Research Planning
Advanced Network Research
Experimental Interface Design, Implementation
and Analysis
III. RESULTS OF THE PROJECT

A. Study of I-R Systems and Research Planning. Our
effort called for an examination of several online I-R
systems from the viewpoint of the requirements they would
impose on a computer interface design. The purpose was to
find at least one I-R system outside M.I.T. which would be
suitable for our network experiments and whose
administrators and technical staff would be willing to cooperate in
these experiments.

We reviewed many of the important I-R systems.
We participated in the feature analysis of twelve
interactive systems performed by Dr. Thomas Martin of Stanford
and attended a three day seminar for this purpose at
Stanford University. Our review of these systems led us
to the conclusion that our original ideas for a network
interface merited detailed development. In addition, we
identified the MEDLINE retrieval system of the National
Library of Medicine as the best choice to be included as the
first system outside of M.I.T. with which to experiment.
Our reasons for choosing MEDLINE include:



IH Kipt I in, n<»v a 1 t be Uni vci : U v i I r* <Ui< he I n Co J \ S t J r* \ & . 
©t*«1 t>j f tW 1 n J 31 l-c/ < f the Institute f<t C ttufetKin I • a t 1 « Hi 
^1 Kctcs! t ! s,l M ! v a4 t < < Itl if ^ p Hjm ) i » »t < tbCe *t *iuj »e t a 
(1) NLM staff was very cooperative in aiding
our research efforts. They provided
information about MEDLINE and ready access to the
system.

« 

w^b i r h pr«'vl<Jce a < nnt r i»»t fm e * pc r i mept a ] 

9 j***r j-f-aca v}th the- f r - v<--*ln?J <s»t *r i «^-!tr?4^i?^ft ty>& 

lnt r^t rUvfl] *y«t en& of M . 1 . T . 

( "i ) rtr.Dl.IW". hac cctal.litl.cO an evper I went a ] 

r f mne rt i»m to t he A P VM<L'J . t he AP FA nc t v'< T l* . vt •« } < h 

V<ni]<3 pTOVC fiCjpfiil In fwTthdfJtifc "Uf "V>h net 

* J> 

|r> the r mir U c,f r>\;r vf r i* <n\ thjr t ^ wc 
)i| o^fUr^il the h v 1 r v t H ' I t to l?*< 1 wOe the ) c v J c v f { e * } c t 
i r 'H ( «wpjt'*t net vor ire v*h i <r h r<>ij]*l he ufi c <J Jn fn>| r*e | yn ft h 
| P c>|»cr Jiucr,t e 7*h J a ) c v J c vf f r| < n-e d > cva * <1 } r* £ Jn that we 
*} i o < fvc T e cj br«v < iruJO effectively u& e in mil c * pc J » tut n t A 

h<lh AJ?TAKr/J an<1 TTMNO 4 the c tuiiput ei t.clwi )i- if the 
TYMT^IlAJtJ C,c t jm tat i<*t*. Ihtoi ^h vfii« h iiic e t i I I he *j»j|u * t «M 
i ^»cr i at ; ( »ria 1 * m 1 j r »e h * 1 } j < £j & j»h ; < ? c It U v«] a i*p t c-mft e 
acroDtilJe AJ< tU t»*1 AJifAJU 7 va: c j r at Tt 1 t . t^1*f*efc U * 
fte~vhua t » ( M'.N I t i wftj »wl ct^U-i ju-J Dt i«tn < 5 i*nu: > i < a ( C C Jk ) a ti«^ 
i< J *; | ' t < M-c »1 vc t v he 1 £ f w ] i r i art '. si ^ & in :ci t t ; p 

i c>|»ci i Hie t a 1 tie t v< t i< ike. ) ,!>c«1 tui -i c iul h 1 »c } » »• 

1 1 i i icvlcv: < f vat 1 out 1 t« net cu.s vc a ha^ 
with such groups as TYMSHARE, Lockheed, Systems Development
Corporation (SDC), the Atomic Energy Commission, Battelle,
Stanford University, and NASA.

B. Advanced Network Research. One major effort under
our grant was a broad-based study of the problem of
interconnecting diverse and geographically separated information-
retrieval systems. A paper describing the early results
has been prepared; it was also presented at
the Network Interconnection Panel of the Annual Meeting of
the American Society for Information Science
in October 1973 at Los Angeles, California. Highlights of
the results are given below.

] ?"he CrHiawn C<n«jrvit e| ]r*t cf f *< e 

la&Jc j»K ! , ] «>tt -^ir fc-f 3 t-v { e * < « «r*r»e < t tr*e. 



t» J € J .ai «1 ^ KfcS 1 < . A ar.e» "i a t J Cjcj"'i*ut e 1 It. 1 a * J a< e J d 
J tve 1 vvt 1 I lie t o t £tr,c i«wt "t,1 ti fi< t 1 vc t** f « } tua t 5 « n> Ke t | * c #♦ a 1 
^ v a t Citii© ^ Tt 4 »« a- e tl ; r> f<i Ific J i, i e t f a< e Me*e t 1 *i£ f < i T$ « ' OWi 
ii.lt ^ l,ftt^ufl^s ftt.il I nf < I tiia i I t *ri tel ) Utfil ft i> venule* r 3 •/? ^ , 

UUlift «!sitfcV M*t rlenil ( r.iM}< «H<r< In t* e j*af at I cm) ,Ulni 
1 v : I'tuit rj tt' h t **e /. s t * « ;ati<»?i fti ( t tnif -lit j t*£ hU*<hi*»cjv i.j«e«Jal 
Jtvl elect li c u| t **n rt < fcr oitm-i ri^. i.sn^ua^ct (McriAft) «rt«1 Jti 

(rliMfll ier, ^tl I .C VSJ (^Jujfi) »ti<1 t he It»tlUu(c f«| t *m»]MUl 

*.< e fu e t ftt*«1 ! e c 1 < £ v , tv a t 1 1 >t,o I T* uxi c e u t *I S» I ati<lai *1 a ^ I 1 5* . 

I »e j - a I f »tic r ,T < f C i titnitc t « e 
l.r-U)r,^nc.'uP > nf r.f t > * n ? d ? jtvfl) eyefcu»c «»ay he I cyrvO
the- / * itfpfw n r < >?r.j m iU| j r - 1 c ) f 5 ' c a> « * rt^rj t c 7 C y c t ctij f # ,7 

t { f c 1 j f.j. f he. t 7 pr» c J 3 t i ' t»c ne. f cceprv * ' p 1 j "V a» ucct \ # . 
av< « tec n»-»l t i »} i vp? cc ] e yc f ert#e > r» a <• rtrtiuiftri f 7 2 rue y*< r i* 

Vcc F > fc? 1 ( • 1 J 7 c t i c t a>rM Uucnt^ have tcr»'/c] t'< 

; ; < 1 t> t ; r. ^ t . 51 r»c { * 7 i t c c ] & \ ■< 7 * t I * t, &r»<l t-Mcr.c i f m ] t, 1 

; f f < w 1 M ! !.c < • tutu** « . jr.tcf f »r c c f « « a > J r ] f.fcvC I f.t 

; M £ »c > ! V ' f ^ \ 4 1 t *• »0 1 c } * 1 € u * that » c i t tfvu.).] 5* ; i*c 5. 7 

1 a t } r.£ 1 c : vM cMh t * t nic? v # * h a ] ] t he r^cr c ?t&t V 

• ; ] | t let * f a ich ',(v»j c \ c t err ;>c f 0 } ] e c j_ 1 vt r, 

■.;r-'"ej ( f.c ■ - a 1 Z c .£ r • Miip* r»e r. f £ * / t f •€ if.( c / far e ]y J e f c 0 he 

^ i r . * a ♦ * 3* i I 1 c t < wr*e * t , < i.r.rl he t y< j I* C « itinn ;r. ; < - t ; \>'-f 

Jt.l t 1 i i?ririt» 1 i nt, t 1 ! c vc t a ] * 1 iuii*> , t c i £ vac 
. ti. ■ ; c me r . t e »"! c * : »c 1 , :ne r , t 2, '» \ \ \ r . a* \ I u. v i c *J ViV I h ) « »• . 1 h ft J , e ) alu 

V 
fully in Section C. The development of more advanced
communications channels awaits a more complete exposition
of functions to be performed by the interface.

3. Common Command Language

Our study of the command languages of several
information retrieval systems and for a common interface
language led to a number of important, if still
tentative, conclusions:

a. Retrieval commands generally comprise bundles of
individual functions.
# 

b. T*> arply?c the r man**) r» <* c ^de^iAa t p ] y , then we im/et 
chv^^tc t he many j?)i«w?iVf f tstu f # <rtic Menu vrlk%c h 
(IcJinat v c tiittsuaftdfr t »r. he svot ! x c- S 5 c 0 a& ttiar t , 

r . Mcwrvci 4 Jot ufrcj » or.vrtiUtK c it i s> net cttfl/r y that 
< tint ua Otis he mattes lit her than h«?JvJ«3udl f>i j-zfcl t J vc 



f k Ut 1 t.fc 1 itilfeuatid 1 stifc uaj-e a 4 the- tyti^w ft»t 

,'h/c li t Kc v ate- ititcn*led 4 a? c ^ c 11c ) a ) } v far i f < m* 



*1 . hi 

vh^c h ( he v ate- S tit p r»«lc «1 4 au- £ cue* ally Sat i f tnu 

c o:i;j»letc i t ( ii»i|t) elicntive li. t Lc ti.m.hct < 1 5itt.i t 1 or.: 
they « » j»c ) f « * J iii 



; I'd 1 I t ! al'»«vt , * t 4 £ e nc J * 1 1 \* 1 iL^u C & i ! 1 c it 

A t<> < < 1 tn« t c t tt::i:i,amlt io 1 ftnj.iPt'c T* 

f t, * I © t ; of* ? v * * ciu.r a o»l I l.cii it mt:i»si o<1 1 : 1 1_ 1 ia t c; a J e 

t c t ^vSTo {.t cat c] t i iiuf! e hcj.t 1 \*c/i .r c- : * c ninn* n.s 1 i 1 

ZLitu tit : v J I c i»£ 

Ncc :«olhMi C hcli»w fe>) futthel ahalrele 
g. It is a common misconception that the difficulty in
translating among command languages results from
the different names given to the functionally similar
commands and their arguments in the different
systems. As suggested above it is, rather, the
diversity of "function bundles" and the
incompatibility of exact translation that are the prime
difficulties. It is, however, convenient for the
user to be able to reassign command names into more
familiar or easy-to-remember labels. This is one
requirement of an interface language which, in effect,
allows many dialects within the common-language
framework.




Therefore, it is prudent to research command languages
along the following lines:

i. Analysis of retrieval commands in order to
dissect them into their primitive functions.

ii. Development of a common command language
consisting of clusters of those primitive functions.

iii. Emphasis on techniques for assisting users in
situations of general incompatibility and
inexactness of translation, as well as on general
user aids for easier system use.

iv. Initial emphasis on access to the network
through the common language rather than through
the languages of any of the several I-R systems
themselves.

v. Open-endedness, flexibility and modularity in
the design of the command language is desirable
in the dynamic environment of today's I-R
systems.
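
The idea of treating each retrieval command as a bundle of primitive functions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the primitive names (`select_terms`, `count_postings`, `display_records`), the command names, and the "fewest extra primitives" selection rule are all hypothetical and are not taken from the report. The sketch shows why translation is often inexact (the nearest target command may do more than was asked) and how a user dialect can be layered on top by simple renaming.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """A retrieval command viewed as a bundle of primitive functions."""
    name: str
    primitives: frozenset


# Hypothetical primitive functions and command bundles, for illustration only.
COMMON = {
    "find": Command("find", frozenset({"select_terms", "count_postings"})),
    "show": Command("show", frozenset({"display_records"})),
}
SYSTEM_X = [
    Command("SEARCH", frozenset({"select_terms", "count_postings",
                                 "display_records"})),
    Command("PRINT", frozenset({"display_records"})),
]


def resolve(name, aliases):
    """Map a user's dialect name (an alias) back to a common command name."""
    return aliases.get(name, name)


def translate(command, target_commands):
    """Pick the target command covering all needed primitives with fewest extras.

    Returns (target_command, extra_primitives), or None when no single target
    command covers the bundle. A non-empty extra_primitives set signals an
    inexact translation that the user should be warned about.
    """
    best = None
    for candidate in target_commands:
        if command.primitives <= candidate.primitives:
            extra = candidate.primitives - command.primitives
            if best is None or len(extra) < len(best[1]):
                best = (candidate, extra)
    return best


aliases = {"look": "find"}  # a user-defined dialect name
target, extra = translate(COMMON[resolve("look", aliases)], SYSTEM_X)
print(target.name, sorted(extra))
```

In this toy case the common `find` maps to the target `SEARCH`, which also displays records; the reported extras are the kind of inexactness that user aids would need to explain.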

4. Interrelating Between Diverse Indexing Vocabularies
In our early efforts we investigated the notion
of using the Intrex techniques of phrase
decomposition and stemming to alleviate the difficulties
of searching data bases that have been indexed under
different controlled vocabularies. During the research,
this notion was extended to comprise the concept of
the Master Index and Thesaurus (MIT). The MIT contains all
the index and thesaurus elements of each of the data bases,
including an ordered list of all vocabulary terms used for
indexing together with the counts of the number of
documents indexed by each and the thesaurus relations for each.
In addition, through use of the techniques of phrase
decomposition (that is, breaking a phrase down into its
individual words) and stemming (dropping word endings so as
to consider only the word stems) we can automatically
identify most intervocabulary relationships in addition to the
identity relationship.

The mechanism for readily storing and referring
to these relations is to provide in the Master Index and
Thesaurus references to all vocabulary terms under
each word stem that appears in that term. (See Fig. 2.)
Thus, for example, the NASA term electrical insulation*

* NASA Thesaurus: Alphabetical Update, September 1971
is automatically found to be related to the following
TEST* terms in the way specified: specific to electricity
and insulation, generic to electric insulating papers,
synonymous with electrical insulators and, of course,
electrical insulation, and otherwise related to electric
fields, electric sparks, and thermal insulation.

The MIT can be used to help determine which terms
to search under and, indeed, which data bases to search in.
The MIT can be used either in a purely automatic mode or in
a manual or prompting mode in which the user is given a
display of terms to search under. Interestingly, the MIT
should be a useful adjunct even for a single system with a
single data base having a controlled vocabulary. In short,
the MIT concept clearly has important potential in the
development of I-R networks.**
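
The stem-keyed index described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the suffix list is a crude stand-in for a real stemming procedure, the function names are hypothetical, and the relation rules (equal stem sets, subset, superset, partial overlap) are an assumption inferred from the electrical insulation example. Document counts, which the MIT also stores, are omitted.

```python
from collections import defaultdict

# Crude, assumed suffix list; a real stemmer would be far more careful.
SUFFIXES = ("icity", "ical", "ing", "ion", "ors", "or", "al", "ic", "s")


def stem(word):
    """Drop one word ending so that only the word stem is considered."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stems(term):
    """Phrase decomposition followed by stemming of each word."""
    return frozenset(stem(word) for word in term.split())


def build_index(vocabularies):
    """Map each word stem to every (vocabulary, term) that contains it."""
    index = defaultdict(set)
    for vocabulary, terms in vocabularies.items():
        for term in terms:
            for s in stems(term):
                index[s].add((vocabulary, term))
    return index


def relate(term, index):
    """Classify every indexed term that shares at least one stem with `term`."""
    source = stems(term)
    candidates = set()
    for s in source:
        candidates |= index.get(s, set())
    relations = {}
    for vocabulary, other in candidates:
        target = stems(other)
        if target == source:
            relation = "synonymous with"
        elif target < source:      # other term uses fewer stems: broader
            relation = "specific to"
        elif target > source:      # other term uses more stems: narrower
            relation = "generic to"
        else:
            relation = "otherwise related to"
        relations[(vocabulary, other)] = relation
    return relations


index = build_index({"TEST": [
    "electricity", "insulation", "electric insulating papers",
    "electrical insulators", "electric fields", "thermal insulation",
]})
for (vocabulary, other), relation in sorted(relate("electrical insulation",
                                                   index).items()):
    print(f"{relation} {other} ({vocabulary})")
```

Under these assumptions the sketch reproduces the relationships listed in the text for the NASA term electrical insulation against the sample TEST terms.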

5. Bibliographic Data Elements and Structures

In our research we have determined that another
prime consideration in the development of means for users
to interact conveniently with different data bases is

* Thesaurus of Engineering and Scientific Terms, prepared
for U.S. Department of Defense by Office of Naval Research
Project LEX, 1967

** See Section C.6 and the appendices for additional details.
the interrelation of the diverse data elements and
structures from those data bases. Three ways in which
the interrelations are important may be enumerated. First,
searching is done on one or more data elements: in order
to translate a search done on one system into another,
the correct correspondence of data elements must be found.
Similarly, user output requests require the specification
of combinations of data elements from the catalog records.
Finally, in order to combine retrieved document sets from
different data bases and to create searchable document
sets from separate data bases, we need to: identify when
document references from different systems refer to the
same document; establish common reference formats; and
create common index (inverted file) and catalog data
structures.

Our basic solution to this problem is the
concept of a common data structure based on the
identification of data primitives or basic data elements analogous
to the basic component functions of the common command
language. Compound data elements in any system can then
be translated into, or composed from, combinations of
basic data elements in the common data structure. The
basic data elements would be hierarchically arranged
into a data structure and, typically, the compound data
elements of a system would be equated to a higher level
node of the common data structure.

At the highest level we have subdivided the
common data structure for bibliographic data into
seven major categories. An initial breakdown of one of
these, the Abstract-Indexing-Contents category, is shown
in Fig. 3. There are 21 basic data elements identified
in this category with 14 higher level hierarchical
groupings. Note that the abstract sentences, which are
included under the abstract grouping, are separated out
individually to make it easier to use this information in
subject indexing.

Three additional major categories which have been
similarly analyzed are Titles, Names-and-Relations, and
Related-Document-Citations as shown in Figs. 4-6,
respectively. The other three major categories which were identified
but not fully analyzed are Descriptive Document Features,
Library Systems Holdings and Shelving, and Control Fields.
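
A small sketch may help make the hierarchical common data structure concrete. It is illustrative only: apart from the category names Abstract-Indexing-Contents and Titles, every element name, system name, and field name below is hypothetical, and the tree is far smaller than the structures shown in the figures. The sketch equates each system's compound field with a node of the common tree and finds corresponding fields in another system by comparing the basic elements beneath those nodes.

```python
# Leaves (value None) stand for basic data elements; inner dicts are groupings.
# Only the two category names come from the report; the rest is hypothetical.
COMMON_STRUCTURE = {
    "Abstract-Indexing-Contents": {
        "abstract": {"abstract_sentence": None},
        "indexing": {"subject_term": None, "free_keyword": None},
    },
    "Titles": {"main_title": None, "subtitle": None},
}

# Each system's (possibly compound) field is equated with one node of the tree.
FIELD_MAP = {
    "SYSTEM_A": {"TI": "Titles", "AB": "abstract"},
    "SYSTEM_B": {"TITLE": "main_title", "TEXT": "Abstract-Indexing-Contents"},
}


def basic_elements(tree, name):
    """Return the set of basic (leaf) elements under the node called `name`."""
    def leaves(node_name, subtree):
        if subtree is None:
            return {node_name}
        found = set()
        for child, grandchildren in subtree.items():
            found |= leaves(child, grandchildren)
        return found

    def search(subtree):
        for child, grandchildren in subtree.items():
            if child == name:
                return leaves(child, grandchildren)
            if grandchildren is not None:
                hit = search(grandchildren)
                if hit is not None:
                    return hit
        return None

    result = search(tree)
    if result is None:
        raise KeyError(name)
    return result


def corresponding_fields(source_system, field, target_system):
    """Fields of the target system sharing a basic element with `field`."""
    wanted = basic_elements(COMMON_STRUCTURE, FIELD_MAP[source_system][field])
    return sorted(
        target_field
        for target_field, node in FIELD_MAP[target_system].items()
        if basic_elements(COMMON_STRUCTURE, node) & wanted
    )


print(corresponding_fields("SYSTEM_A", "TI", "SYSTEM_B"))
print(corresponding_fields("SYSTEM_A", "AB", "SYSTEM_B"))
```

Because a compound field may map to a node covering more basic elements than its counterpart, the correspondence found this way can be broader or narrower than the original field, which is the data-side analogue of inexact command translation.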
\c )r ( I trr>c f { two ci;t*cnlly ' onftcr tcrj ] U ft ye I c :t»2, J or
*?elc< I r I t he- 1 t he i < rrxrnc rti CXiJilT language c^r the
pj cvh>u6 trat ih J ct-vjc&l .
i J j ^amc a file- i nf ' vl.jr h I l.c t tc ij] I c # » f the

pf*f-rvc print 7t#jnccf r pri lx e to? eel f».j c i>|i - 
c c »^ i c n t vi cwiT*fc 

i V I t T f « T tl.i Vi tV'if.f, ' f file: i - t * V 1 < ri j e 1 y c ^ v«- 0 , 

* * in % i i i j . 

/• 

X' I- p^^T * ? r iT?fr r-f t he t[CT | n*fe* and 

t he e 4t;t ijc f#>? a i 0 in r]e| cpninirif: flip jit "pr i a t e 
ttat( h t e- 

v i ^ejo. C\ic t he f r>riirr*5»r#'}c ^ vtri in (a) and (V)) en 

pa In :c]cr I anO & c ar < h t he < t t »e r ] * cyftt. er» * 
w-hlrh i* r wrtfnt J>* t f ^r»c r I ^5. 

)vft f -.that !f*r'n;f;h appropriate r ornhi nat i on ft 

thece c c » a tyudT»rle the rttij].t c of tcArr hing in different 
ftrfct e*r**fc f an I c f ,fj Jcc t ed in a^ r r Wfchon flic 

The detailed status of our implementation of
this CONIT-1 design is given below.

1. Physical Interconnections and Network Communication
Our original intention was to make a connection of
the interface to the Intrex retrieval system at M.I.T. and
to the MEDLINE retrieval system. Establishing the
communications links to do this took a larger effort than we originally
anticipated. However, we finally overcame several
difficulties and we did achieve a communications base that
considerably surpassed that of our initial goals. In particular, our
setting certain parameters in the TIP for the connection.
The terminal is then connected through a second data set
to the M.I.T. IBM 370 in which Intrex is located.
Finally, a direct connection is established between the
two data sets, thus completing the communication link
between the ARPANET TIP and Intrex. The electrical and
mechanical hardware by which the terminal is switched from
TIP to 370 computer, and by which the two data sets are
connected together, has been termed the "network
crossover box" and was designed and constructed at the
Electronic Systems Laboratory. (See schematic diagram.)

The same mechanism is used to establish
connection between the TIP and the TYMNET network by
merely making the second terminal call be to the local
TYMNET satellite computer rather than to the M.I.T. 370
computer. It should be noted that the network crossover
box mechanism has the flexibility to be connected (at
different times) to different computers. This flexibility
is of importance in experimenting with different I-R
[Schematic diagram of the network crossover box; connections labeled "to 370" and "to TIP"]
In any case, once any TIP has been connected
to a non-ARPANET computer, the CONIT interface program
must establish access to the specified TIP port through
appropriate calls on ARPANET software. This having been
done, CONIT can send commands to either of the two
currently connected I-R systems through the respective
TIPs and receive responses from these I-R systems over
these same full duplex connections.

Thus, at any one time, CONIT can be
communicating with either of the two I-R systems
currently connected: one being the MEDLINE system through the
NBS TIP, the other being either Intrex or one of the
TYMNET systems through the CCA TIP. The different TYMNET
systems can be connected to CONIT sequentially by having
CONIT send the commands to logout the currently connected
system and then logon a second TYMNET I-R system. Finally,
CONIT can switch from a TYMNET system to Intrex, or vice
versa, by redialing the connection to the CCA TIP from
the network crossover box.

3 . The*" Basic Iriterface Programs 



V 

The CONIT-1 system contains as a nucleus a set \r 



of program modules that enable a user to Have an 
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interactive dialog with the computer. These modules 
include, for example, routines that provide fqv message 
transfer between a usee terminal atid CONIT, the parsing 
of user requests into individual command and argument 
strings, transfer of control tp routing that execute 
commands, and the storage and selection of messages sent 
the user in full pr abbreviated ^formats. 

Another set of routines enables and disables the ARPANET TIP
connections, as mentioned in the previous section.

A third set of routines allows for the translation of user commands
into commands in CONIT or external retrieval systems. These routines
are based upon the concept of translation tables. CONIT automatically
selects the appropriate translation table when, for example, the user
is speaking the CONIT language to MEDLINE or Intrex. Special purpose
routines also perform other functions required for proper translation
to external systems; for example, MEDLINE must receive characters in
upper case, and user commands to MEDLINE are so converted.
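The translation-table idea can be illustrated with a minimal sketch.
The table layout, the function name, and the choice of SUBJECT as the
Intrex rendering of FIND are assumptions made for illustration only;
the report itself names only the PRINT-to-OUTPUT mapping for Intrex,
the optional "FIND x" form in MEDLINE, and the upper-case requirement
for MEDLINE.

```python
# Hypothetical translation tables: CONIT command name -> target command name.
# Only PRINT->OUTPUT (Intrex), PRINT->PRINT and FIND->FIND (MEDLINE) come from
# the report; FIND->SUBJECT for Intrex is an illustrative assumption.
TRANSLATION_TABLES = {
    "MEDLINE": {"PRINT": "PRINT", "FIND": "FIND"},
    "INTREX": {"PRINT": "OUTPUT", "FIND": "SUBJECT"},
}


def translate(command_line: str, system: str) -> str:
    """Translate one CONIT command line for the selected target system."""
    name, _, args = command_line.strip().partition(" ")
    table = TRANSLATION_TABLES[system]
    # Commands with no rule pass through unchanged ("transparent" mode).
    target = table.get(name.upper(), name)
    translated = f"{target} {args.strip()}".strip()
    # Special-purpose step: MEDLINE must receive upper-case characters.
    return translated.upper() if system == "MEDLINE" else translated
```

A table per target system keeps the system-specific knowledge in data
rather than in code, which is what allows rules to be added or deleted
during a session.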
4. CONIT-1 Command Language

In the design of a command language for CONIT-1 the question
naturally arose as to the structure and characteristics of such a
language. We have inclined toward a simple command-name/argument-
string type format with common English words for the vocabulary
(abbreviations allowed), very simple punctuation (e.g., spaces) for
delimiters, and general independence of command and argument
ordering. This structure seems preferred for ease of use, especially
compared to a more complicated programming-type language. English,
used as a common language, suffers from the twin defects of being of
complicated structure and being very ambiguous; these defects make it
hard to explain to users what precisely can be accomplished in the
system at hand and even more difficult for the system to parse the
user's request syntactically and semantically.
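A minimal sketch of parsing in this command-name/argument-string
style follows. The unique-prefix abbreviation rule and the function
name are assumptions; the report says only that abbreviations are
allowed. Only single-word command names are handled here.

```python
# A subset of single-word CONIT-1 command names, for illustration.
COMMANDS = ["SELECT", "SPEAK", "VERBOSE", "BRIEF", "DISCONNECT",
            "FIND", "PRINT", "FILE", "EXIT"]


def parse_request(line: str):
    """Split a request into (command name, list of argument strings).

    The first word is the command; it may be abbreviated to any prefix
    that matches exactly one command (an assumed rule). Spaces are the
    only delimiters.
    """
    words = line.split()
    if not words:
        raise ValueError("empty request")
    typed = words[0].upper()
    matches = [c for c in COMMANDS if c.startswith(typed)]
    if len(matches) != 1:
        raise ValueError(f"unknown or ambiguous command: {words[0]!r}")
    return matches[0], words[1:]
```

With this rule, "pr" resolves to PRINT, while "f" is rejected because
it could mean FIND or FILE.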

These views on language structure have been supported by our own
previous work in Project Intrex* and the direction we see existing
systems taking as evidenced, for example, by the previously mentioned
Stanford seminar. With these views in mind, we began the design and
implementation of the CONIT command language. The initial set of
commands for CONIT-1 that were implemented are listed in Table I.
Note that in several cases (e.g., FIND, PRINT) only the simplest
default conditions were implemented. In addition to these regular
CONIT commands, if a language other than CONIT is being spoken, any
command of that language may be given; i.e., a "transparent" mode is
possible. It can be seen by reviewing the explanation of the commands
that, with one exception, all the goals for CONIT-1 as set forth in
the initial design given above have been met, including the
translation of a couple of commands from the common command language
into the languages of two target systems. (The status of the one
exception, display of the Master Index and Thesaurus, is described
below.)

* See Marcus, R.S., Benenfeld, A.R., and Kugel, P., "The User
  Interface for the Intrex Retrieval System," in Interactive
  Bibliographic Search, D.E. Walker (ed.), AFIPS Press, 1971,
  pp. 159-201.

Table I. List of Commands Executable in CONIT-1

COMMAND              EXPLANATION

SELECT x:            Select a system x to search in and enable TIP
                     connections.

SPEAK x:             Select command language x.

LIST CONDITIONS:     List current language and system being searched.

VERBOSE:             Have CONIT give full-form, instructive responses.

BRIEF:               Have CONIT give abbreviated responses.

SET TABLE x:         Pick translation table named x.

REPLACE $x-y:        In current translation table, add the rule that
                     the string x is replaced by y.

DELETE RULE $x:      Delete from translation table that rule with x
                     as the string to be replaced.

LIST TABLE:          Print out rules in current translation table.

DISCONNECT x:        Disable TIP connection to system x.

(T) FIND x:          Do a simple search on x in system currently
                     selected.

(T) PRINT:           Print out standard catalog information on
                     documents in current list.

NAME FILE x:         Select file x (create a new file if x does not
                     already exist) to be used to save future catalog
                     output.

FILE:                Add response of next request to currently
                     selected system to save file.

EXIT:                Leave CONIT and return to MULTICS command level.

(T) indicates that a translation to the selected system is required.

The details of the CONIT command language, while following the
general principles described above, were developed in only a very
preliminary stage; should our work be continued, they are subject to
change. Nevertheless, the decision-making process for choosing
certain of these details may be of some interest. The use of the
command name FIND for the basic search request deserves some comment.
It is different in form from both of the systems it now translates
into. In Intrex there are three different search commands (SUBJECT,
AUTHOR, TITLE) to specify the different inverted files or inverted
file subsets (title words are a subset of the subject inverted file)
to be searched. In MEDLINE the absence of a command name implies a
search request (just the term to be searched is given), although one
may use, or need to use in certain situations, the form "FIND x".

Thus, both Intrex and MEDLINE may be characterized as having, in
general, the search command being a default situation; i.e., being
unspecified as such. Our Intrex experience has suggested that users
find the command language easier to learn and remember when it is
more consistent. The default command upsets the simple rule: command
name first, then arguments. Leaving off the command name does not
save much, since an abbreviation as short as one letter may be used.
It also makes parsing and error checking more difficult. Thus an
explicit command, one in common use, was chosen.

It should also be noted that even in this rudimentary interface it
can be seen how a number of the complexities of use of multiple
diverse systems can be mitigated. For example, the CONIT user need
not know the intricacies of logging on to the remote systems; only
the SELECT command need be given; the rest is taken care of for him
by CONIT. In the case of regular MEDLINE use through the ARPANET,
there is the special problem of accessing through one of 5 TIP ports.
CONIT automatically transmits the proper protocols and even cycles
through the 5 ports, one at a time, until it finds one that is
available. Similarly, CONIT gives the proper log off, or QUIT,
message to system x whether because of a DISCONNECT x or implicitly
required through an EXIT from CONIT. Proper termination of a user
from a system is important not only for the individual user (e.g.,
for proper charging) but also for other users who might otherwise
face problems in logging on.
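The port-cycling behavior described above can be sketched as follows.
The function and parameter names are hypothetical, and the sketch
makes a single pass over the ports; whether CONIT retries after a
full pass is not stated in the report.

```python
def connect_first_free(ports, try_connect):
    """Try each port in order and return the first that accepts.

    `try_connect` is a caller-supplied function (hypothetical) that
    returns True when the connection to the given port succeeds.
    Returns None if every port is busy.
    """
    for port in ports:
        if try_connect(port):
            return port
    return None
```

For example, with ports 1 and 2 busy,
connect_first_free(range(1, 6), lambda p: p not in {1, 2}) selects
port 3 without the user having to know that ports exist at all.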

5. Implications for Command Language Design

In considering how to design a common command language that could
serve as an interface to existing retrieval systems, we came to a
rather paradoxical conclusion: it is both impossible and not too
difficult to translate from such a language to the languages of the
existing systems. It is impossible in the sense described above in
Section B.3 in that existing systems are usually rather limited in
the functions they can perform and simply cannot perform an arbitrary
request. On the other hand, if an approximate translation is
allowable, it is often not too difficult to make a reasonably good
one. For example, the simple PRINT command in CONIT can be translated
to the "PRINT" command in MEDLINE or the OUTPUT command in Intrex
with reasonably close results even though the catalog information
printed might occasionally vary slightly in the default modes of
these commands.

Two ways to aid in the translation process are described elsewhere in
this report: the Master Index and Thesaurus (MIT) and the common
bibliographic data structure. Two other elements we have found
important in the interface problem are the identification of search
statements and the disambiguation of named entities (commands,
arguments, sets, etc.), as discussed below.

Firstly, it should be recognized that the names given entities in the
target systems can be isolated from those names given in the common
language, as long as an unambiguous translation is possible from
common-to-target language. Thus the main task is to provide a
mechanism for insuring an unambiguous set of named entities at the
common system level. (Of course, the naming of entities, such as
commands, at the common level should consider such factors as the
advantage of using already established, widely used terms, e.g.,
PRINT.) The common interface then needs to keep account of all
entities that are named a priori in the common command language or
that may be named in the course of a given user's session. Such
entities can be generically classified as:

     1. command names

     2. names of arguments to commands

     3. names given to retrieval search sets

     4. new names given by a user (as through a REPLACE function)
        to any one or combination of entities in the above 3
        categories.

Argument names cover such entities as names of systems, data bases,
saved files, catalog elements, vocabularies, vocabulary elements,
modes of command operation, and so forth. The common interface would
insure against ambiguities by checking names a user might initiate
against the current list of named entities and request that the user
choose again if a duplication is detected. It is possible, of course,
to disambiguate at times on the basis of context, but it seems
simpler and easier to explain system use to a user if all ambiguities
are avoided in the first place.
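The duplicate-name check described above might be sketched as a
single registry covering all four categories of named entity. The
class name, the category labels, and the case-insensitive comparison
are assumptions for illustration.

```python
class NameRegistry:
    """One shared list of named entities at the common-interface level."""

    def __init__(self, predefined):
        self._kind = {}  # upper-cased name -> category of entity
        for name, kind in predefined.items():
            self.register(name, kind)

    def register(self, name, kind):
        """Record a new name, refusing any name already in use."""
        key = name.upper()  # assumption: names are compared without case
        if key in self._kind:
            raise ValueError(
                f"{name!r} already names a {self._kind[key]}; choose again")
        self._kind[key] = kind

    def kind_of(self, name):
        """Return the category of a name, or None if it is unused."""
        return self._kind.get(name.upper())
```

Because every name lives in one table, a user-chosen search-set name
can never collide with a command or argument name, and no
context-dependent disambiguation is needed.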

The identification and recording of search statements and their
results appears to be an especially vital task where multiple systems
and data bases are involved. The elements associated with a search
that need to be recorded include:

     1. the search statement number (assigned by system);

     2. an alternate search statement name given by user;

     3. the search statement itself;

     4. the system(s) and data base(s) for which intended;

     5. the number of references (or documents) retrieved;

     6. the name of search given in any target system;

     7. the actual list of references retrieved;

     8. for any subsearches* necessitated by this search, either at
        level of common system or the target system, all the above
        items of information.

All items of information above (except, for some systems, Item (7))
could be obtained and maintained at the common interface level
without requiring modification of the target system.
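The eight items listed above map naturally onto a record kept at the
common interface. The following is a sketch only; the class and field
names are hypothetical and the report does not prescribe a storage
format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SearchRecord:
    number: int                              # item 1: assigned by system
    statement: str                           # item 3: the search statement
    targets: List[str]                       # item 4: systems / data bases
    user_name: Optional[str] = None          # item 2: alternate name
    hit_count: Optional[int] = None          # item 5: references retrieved
    target_names: Dict[str, str] = field(default_factory=dict)  # item 6
    references: Optional[List[str]] = None   # item 7: may be unavailable
    subsearches: List["SearchRecord"] = field(default_factory=list)  # item 8


class SearchLog:
    """Assigns statement numbers and keeps the records for one session."""

    def __init__(self):
        self.records: List[SearchRecord] = []

    def add(self, statement, targets, **details):
        record = SearchRecord(number=len(self.records) + 1,
                              statement=statement,
                              targets=list(targets), **details)
        self.records.append(record)
        return record
```

Item 7 is optional in the sketch because, as noted above, some target
systems may not make the actual reference list available.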



* See the discussion in Section C below of the automatic use of the
  MIT for an example of how subsearches may be generated.

The Master Index and Thesaurus

The Master Index and Thesaurus (MIT) concept described in Section D
was developed for inclusion in the experimental interface. A detailed
design was prepared describing the specific information to be
contained in the MIT and the logical interrelations among the various
parameters so contained. Appendix A describes this initial design.

How the MIT could be displayed under manual control is described in
Appendix B. How the MIT could be used in an automatic way has also
been studied. Assume a user makes a search request of the form
FIND A B C, where A, B, and C are ordinary English words. The system
could then search the MIT for all words or phrases that had a word
stem that matched the stem for the word A. Call these terms A1, A2,
A3.... Then a search consisting of the union of searches on all these
terms would be found, i.e., A = FIND A1 OR A2 OR A3.... Similarly,
B and C are found. The search set answering the initial request would
then be the intersection of A, B, and C. Work suggests that this
resulting set would generally give very effective search results.
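The automatic expansion just described (a union over stem-matched
terms for each word, then an intersection across words) can be
sketched as below. The function name, the caller-supplied `stem` and
`search` functions, and the toy data in the usage note are
assumptions for illustration.

```python
def expand_and_search(words, mit_terms, stem, search):
    """Answer FIND w1 w2 ... using stem matches from the MIT.

    words     -- the words of the user's request
    mit_terms -- all words and phrases held in the MIT
    stem      -- function mapping a word to its stem (caller-supplied)
    search    -- function mapping one MIT term to a set of document ids
    """
    result = None
    for word in words:
        target = stem(word)
        # Terms A1, A2, ...: any MIT term containing a word with this stem.
        variants = [t for t in mit_terms
                    if any(stem(w) == target for w in t.split())]
        hits = set()
        for term in variants:          # A = FIND A1 OR A2 OR A3 ...
            hits |= search(term)
        result = hits if result is None else result & hits
    return result if result is not None else set()
```

With a toy stemmer that keeps the first three letters and postings
{"radiation": {1, 2}, "radial": {3}, "damage": {2, 3},
"damaged goods": {4}}, the request FIND radiation damage yields the
union {1, 2, 3} for the first word, {2, 3, 4} for the second, and the
intersection {2, 3} as the answer set.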

Of course, many other modes of use of the MIT may be desirable in
particular contexts, including automatic selection of synonyms and
specific related terms, and user selection from lists of terms found
automatically, as above.

Programs were written to transform information from the data bases of
our two main experimental systems, MEDLINE and INTREX, into the
Master Index and Thesaurus organization shown in Appendix A. The
Intrex data base* was generated from INSPEC tapes containing
bibliographic data from three separate data bases: Science Abstracts,
Electrical Engineering Abstracts, and Computers and Control
Abstracts. The MIT formed from these data bases contained both the
free and controlled-vocabulary information that could be gleaned from
both the INSPEC catalog records as reformatted by the Intrex system**
and the Intrex inverted file resulting from inverting certain of the
INSPEC fields. In particular, the controlled-vocabulary terms (both
classification terms and controlled index terms) and their index
counts are captured, as are all free vocabulary word stems and their
index counts from titles, abstracts, and free subject expressions. In
addition, the listing of controlled terms under each of the word
stems they contain was generated.

Similarly, programs were written to generate MIT information from
MEDLINE data bases in the new MEDLARS-II format. This information
included the thesaurus information given in the MEDLARS vocabulary
files.

*  See 1) Project Intrex Semiannual Activity Report PR-12,
   15 September 1971, Massachusetts Institute of Technology,
   pp. 29-47.
   2) Overhage, C.F.J. and Reintjes, J.F., "Project Intrex: A General
   Review," to be published in Information Storage and Retrieval.

** It is of interest to note that the catalog records for the Intrex
   INSPEC data base were organized in the hierarchical data structure
   described elsewhere in this report.

IV. CONCLUSIONS

A. Work Accomplished. Research on the coupling of interactive
information systems has progressed in a significant way. The research
has focussed on the concept of a translating computer interface by
which the networking of heterogeneous interactive information
retrieval systems can be achieved. This concept appears to be a
viable approach to the development of I-R networks in the interim
period during which I-R system and network standards are gradually
evolving.

Particular concepts and techniques which appear (through analysis and
the design, implementation, and testing of experimental interfaces)
to be especially useful in developing the over-all interface concept
include: (1) the virtual system concept by which users perceive the
network as a single homogeneous system; (2) a common command language
synthesized from a basic language of atomic I-R functions; (3) a
master index and thesaurus which stores the vocabularies of the
separate data bases along with index term interrelationships and
counts; (4) a common bibliographic data structure by which the data
elements for bibliographic information may be enumerated,
hierarchically structured, and interrelated among different data
bases.

B. Recommendations for Further Work. While a basis for research into
the coupling of retrieval systems has been laid, much additional work
is obviously needed including the further elaboration of the
techniques listed above; their implementation in additional
demonstration systems which connect several systems and several data
bases and cover most retrieval functions; the testing and evaluation
of these systems with real users; and the development of more
effective computer-to-computer communications. Also, we see the need
for additional study of the relationship of the computer interface to
the network of I-R systems including such questions as:

     1. How many of the I-R functions should be performed within
the interface as distinct from being performed by the separate I-R
systems?

     2. What are the technological and economic reasons for
treating the separate I-R systems within the network as inviolate
"black boxes" (the assumption we have made in our research to date),
as contrasted with the alternate concept that they should be modified
to interact more effectively with the interface?

     3. To what extent can and should the common-interface concept
become a de facto standard toward which existing systems may evolve
in order to take maximum advantage of networking potentials?


APPENDIX A

DATA COMPONENTS FOR THE MASTER INDEX AND THESAURUS (MIT)

Each entry in the Master Index and Thesaurus contains up to five
fields. These fields with their data components are listed below.

A. Header

   1. Is the entry for a single word stem or a phrase? (1 bit)

   2. Entry type (3 bits) (e.g., subject, author, other)

   3. Name (variable-length A/N string) (e.g., "magnet",
      "magnetic resonance", "Smith", "Van Pelt")

   4. (a) Number of English words in phrase (6 bits) (phrase
          entry only)

      (b) Number of affixes (stem entry only)

   5. Counts (document and reference counts for each collection,
      i.e., data base, for which counts are being kept in MIT)

B. Affixes (This field needed only for word-stem entry.)

   This field contains the set of affixes for a word stem.
   For each affix there would be:

   1. Relative (to this entry) affix number (6 bits) (sort
      endings alphabetically)

   2. Absolute affix code (12+ bits) (at present this would be
      the Intrex 12-bit suffix code; however, we should allow
      for future expansion to permit prefixing, as well).

   3. Document and reference counts (i.e., 2 n_a n_c counts,
      where n_a = number of affixes and n_c = number of
      collections, as in A.5).

C. Controlled Vocabulary Terms (may be null)

   This field lists all the controlled vocabulary terms for this
   entry.

   For each term there would be:

   1. Term number (relative to this entry) (6 bits)

   2. Controlled vocabulary code (6 bits) (i.e., what vocabulary
      the term is in). (Sort by ending, this code, and then type.)

   3. Type of term (4 bits) (e.g., classification or index terms).

   4. Affix code (5 bits) (needed only for single-word entries;
      code comes from B.1).

   5. Status (2 bits). Is this a currently used term? (If not, we
      will eventually include dates when it was used; also date of
      initial use for recent term.)

   6. Reference counts for each collection, i.e., n_c n_t counts,
      where n_t = number of controlled vocabulary terms; note n_t
      includes a separate item for each vocabulary-type-affix
      combination.

D. Phrases With This Stem (For word-stem entries only; may be null)

   For each such phrase there would be:

   1. Phrase number (6 bits).

   2. Term number of phrase (from Field C.1 of entry for phrase)
      (6 bits).

   3. Type of term of phrase (from Field C.3 of entry for phrase)
      (4 bits).

   4. Word number of word stem in phrase (4 bits).

   5. Controlled vocabulary code (6 bits).

   6. Phrase name (variable-length A/N string) (sort
      alphabetically, then by vocabulary and term type).

   7. Reference counts for each collection.
E. Thesaurus Relations (may be null)

   The standard thesaurus relations are given here. For each
   relation there would be:

   1. Term code for given term (6 bits) (from Field C.1).

   2. Type of relation (4 bits) (e.g., broader term, narrower
      term, synonym, use, used for, related term, morphological
      variant).

   3. Controlled vocabulary for related term (6 bits). (Could
      conceivably be different than for given term; note could
      also relate "free vocabulary" to controlled vocabulary.)

   4. Name of related term (variable-length A/N string) (sort
      alphabetically).

   5. Is related term a word or a phrase? (1 bit).

   6. Automatic search expansion? (2 bits) Should the related term
      be automatically added to a search when user asks for given
      term?

   7. Remarks (variable-length A/N string) (probably null).
      (Remarks in free-form English on nature of the relation and
      when it should be applied.)

   8. Reference counts for related term for each collection.
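As an illustration of how the fixed-width items of the Header field
(A.1, A.2, and A.4) could share one machine word, the sketch below
packs them into 10 bits. The bit order, the function names, and the
use of a 6-bit width for the affix count in A.4(b) are assumptions;
the design above gives widths for some items but no layout.

```python
def pack_header(is_phrase: bool, entry_type: int, count: int) -> int:
    """Pack A.1 (1 bit), A.2 (3 bits), and A.4 (6 bits) into one integer.

    Assumed layout, most significant bit first:
    [is_phrase][entry_type x3][count x6]
    """
    if not 0 <= entry_type < 8:
        raise ValueError("entry type must fit in 3 bits")
    if not 0 <= count < 64:
        raise ValueError("count must fit in 6 bits")
    return (int(is_phrase) << 9) | (entry_type << 6) | count


def unpack_header(bits: int):
    """Inverse of pack_header: return (is_phrase, entry_type, count)."""
    return bool((bits >> 9) & 1), (bits >> 6) & 0b111, bits & 0b111111
```

The variable-length name (A.3) and the per-collection counts (A.5)
would follow the packed word in whatever storage format is chosen.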
APPENDIX B

DISPLAYING THE MASTER INDEX AND THESAURUS



Preliminary specifications for the display of the MIT under user
control are given below.

The command RELATE is used to look up and display information from
the MIT. The syntax is

     RELATE A B C X

The last argument (X) is that word or phrase to be looked up in the
MIT. The first argument (A) specifies the type of relation to be
displayed as follows:

     ALPHA      terms that surround X alphabetically
     PHRASE     terms having word stems in common with X
     THESAURUS  thesaurus relations for X

Eventually, some combination of relation types may be permissible on
one display. For now, the default option should probably be ALPHA.

The second argument (B) specifies that only terms and relations in a
particular vocabulary be considered. Sample values for B might be
MESH or INSPEC or INSPEC PHYSICS etc. Several vocabularies could be
specified at once, so that the actual syntax is B1 B2 B3...; i.e.,
the connector OR is implicit. The default condition would mean all
vocabularies considered.

The third argument (C) specifies, analogously to B, the particular
collection(s) to be considered; e.g., MEDLINE, SDILINE,
INTREX/INSPEC, etc. Again, the default condition means all
collections considered.
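One way the RELATE A B C X syntax could be parsed is sketched below.
The function name and the closed lists of vocabulary and collection
names are assumptions; in particular, only single-word vocabulary
names are recognized here, so a value such as INSPEC PHYSICS would
need a fuller table. An empty list stands for the default of "all".

```python
RELATION_TYPES = {"ALPHA", "PHRASE", "THESAURUS"}
VOCABULARIES = {"MESH", "INSPEC"}                       # illustrative list
COLLECTIONS = {"MEDLINE", "SDILINE", "INTREX/INSPEC"}   # illustrative list


def parse_relate(line: str):
    """Return (relation, vocabularies, collections, term) for a RELATE."""
    tokens = line.split()
    if not tokens or tokens[0].upper() != "RELATE":
        raise ValueError("not a RELATE command")
    rest = tokens[1:]
    relation = "ALPHA"                     # default relation type
    if rest and rest[0].upper() in RELATION_TYPES:
        relation = rest.pop(0).upper()
    vocabularies = []                      # empty list means all
    while rest and rest[0].upper() in VOCABULARIES:
        vocabularies.append(rest.pop(0).upper())
    collections = []                       # empty list means all
    while rest and rest[0].upper() in COLLECTIONS:
        collections.append(rest.pop(0).upper())
    if not rest:
        raise ValueError("missing term X")
    return relation, vocabularies, collections, " ".join(rest)
```

Because arguments are recognized by membership in known name lists, a
term X that happens to equal a relation, vocabulary, or collection
name would be misread; this is the kind of ambiguity the single name
registry discussed in the main text is meant to prevent.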

These specifications will be further illustrated through example
display output for some RELATE commands. Note the default options for
arguments B and C. The index terms are not meant to be actual terms
from the given vocabularies.

A. SAMPLE DISPLAY OF ALPHABETICAL RELATIONS

READY

relate alpha radiation

The index terms alphabetically near the term radiation (7) are
listed below. (1) <MRA> (6)

POSTINGS (2)  TERM NO. (3)  TERM (4)          : VOCABULARY (5)     : COLLECTION

    220       TY            rabid dogs          MESH; HEADING        MEDLINE
     15       TZ            rachet wheels       INSPEC EE; TERM      INTREX
     20                                         INSPEC COMP; TERM    INTREX
    305       TA            rad-                (English stem)       INTREX
     21                                                              SDILINE (8)
    107       TA1            radical            (English)            INTREX
      5                                                              SDILINE
    203       TA2            radiation          (English)            INTREX
     17                                                              SDILINE
   1276                                         MESH; HEADING        MEDLINE
     45                                                              SDILINE
    410                                         INSPEC EE; HEADING   INTREX
     13       TA3            radius             (English)            INTREX
    192       TB            radiation damage    INSPEC; HEADING      INTREX
    217                                         MESH; HEADING        MEDLINE
    130       TC            radiation ef-       MESH; HEADING        MEDLINE
                            fects on
                            tissues (8)

If you want to see terms that follow alphabetically, type relate more.
To see phrase or thesaurus relations type relate phrase X or relate
thesaurus X, where X is the name or term no. (see above) of the term
whose relations you want to see. <MRA2> (6)

READY

NOTES FOR SAMPLE DISPLAY OF ALPHABETICAL RELATIONS



(1) This is the VERBOSE mode message; in TERSE mode no message would
    be given. If the argument X did not correspond to any term in
    the MIT the following message would be given:

         There is no index term radiation. [However, there are
         terms with the same stem: rad-.] Terms alphabetically
         near radiation are listed below. <MRA3>

    The sentence in brackets would be inserted when the stem for
    (the single word) X does exist in the MIT, although the full
    word X does not. In the former case, the display would be:

         TA  radiation  (No Such Term)

(2) The document (or reference) counts are given in this column when
    available. (A stem where no free vocabulary word exists would
    not have a count.)

(3) The term number is assigned by CONIT relative to the term for
    argument X, which is TA. The two terms coming alphabetically
    before TA are given first and labeled TY and TZ. As many terms
    after TA are given so as to have at least 20 lines of term
    display. Note full words under word stems are given numbered
    final characters. If a continuation display is requested
    (relate more), then numbering is continued from previous
    display.

(4) Full words under word stems are indented one space. If a term is
    repeated (for a different vocabulary or different collection),
    its printing is not repeated. For each full word in Field B of
    the MIT, first the free-vocabulary counts would be given (from
    Field B.3) and then the controlled counts (from Field C).

(5) The vocabulary is given followed by a semicolon and then an
    indication of the type of vocabulary term (e.g., class heading
    or index term). The vocabulary and type term parameters come
    from Field C.2 and C.3, respectively, of the MIT (see
    Appendix A).

(6) These are message names. They could be arguments to a SUPPRESS
    command, which thereafter causes their deletion (or replacement
    by TERSE form), or the EXPLAIN command.

(7) Some emphasis on the playback of argument X is desirable. Here,
    for example, radiation could be in a different color or
    capitalized.

(8) If information under a given column is too long to fit in that
    column, some convention such as suggested here will be
    desirable.

B. SAMPLE DISPLAY OF PHRASE RELATIONS



READY

relate phrase radiation damage

The index terms with words whose stems match the stem(s) (1) of
rad-iation (and damag-e) (1) are listed below: <MRP>

POSTINGS  TERM NO.  : STEM 1                 VOCABULARY (4)     COLLECTION

          TA        rad-
    305                                      (English stem)     INTREX
     21                                                         SDILINE
    107   TA1        radial                  (English)          INTREX
     13   TA3        radius                  (English) (4)      INTREX
                        (2)

                    PHRASES WITH STEM
     50   TB        polarized radiation      INSPEC EE: HEAD    INTREX
    192   TC        radiation damage         INSPEC EE: HEAD    INTREX
    217                                      MESH: HEAD         MEDLINE

                    : STEM 2
          TD        damag-                   (Stem)             INTREX

                    PHRASES WITH STEM
     18   TE        damaged goods            BUSINESS           INFORM
          (3)       * see term TC above      radiation damage

To see thesaurus relations type relate thesaurus X, where X is the
name or term no. (see above) of the term whose relations you want to
see. <MRP2>

READY

NOTES FOR SAMPLE DISPLAY OF PHRASE RELATIONS



(1) Delete parenthetical parts if only one word in X. Null result
    message:

         There are no index terms with words whose stems match
         the stem(s) for rad-iation (or damag-e). <MRP3>

(2) The full display for this stem, as in Section A, goes here.

(3) Duplicate phrases are indicated in this fashion.

(4) Stem information is given in addition to multi-word phrases as a
    convenient way to indicate what the stemming is for the given
    words in argument X as well as to show single-word "phrases".

C. SAMPLE DISPLAY OF THESAURUS RELATIONS



READY

relate thesaurus radiation damage

The index terms with thesaurus relations to the term radiation
damage (1) are given below: <MRT>

POSTINGS  TERM NO.  TYPE (2)   RELATED TERM       VOCABULARY : COLLECTION

  NONE    TA        CODE       6.21               INSPEC
  NONE    TB        CODE       H.50.31            MESH
  2005    TC        BROADER    radiology          MESH: HEAD   MEDLINE
    75    TD        NARROWER   radiation dosage   MESH: HEAD   MEDLINE

READY



NOTES FOR SAMPLE DISPLAY OF THESAURUS RELATIONS

(1) Null messages:

         There is no index term radiation damage. <MRT2>

         There are no thesaurus relations for the term radiation
         damage. <MRT3>

(2) From Field E.2 of MIT (see Appendix A).